High-dimensional crowdsourced data collected from a large number of users produces rich knowledge for our society. However, it also brings unprecedented privacy threats to participants. Local privacy, a variant of differential privacy, has been proposed as a means to address these privacy concerns. Unfortunately, achieving local privacy on high-dimensional crowdsourced data poses great challenges for both efficiency and effectiveness. Here, based on expectation maximization (EM) and Lasso regression, we propose efficient multi-dimensional joint distribution estimation algorithms with local privacy. Then, we develop a Locally privacy-preserving high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, both correlations and joint distributions among multiple attributes can be identified to reduce the dimension of crowdsourced data, thus achieving both efficiency and effectiveness in locally private high-dimensional data publication. Extensive experiments on real-world datasets demonstrate the efficiency of our multivariate distribution estimation scheme and confirm the effectiveness of our LoPub scheme in generating approximate datasets with local privacy.
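To make the idea of distribution estimation under local privacy concrete, the following is a minimal sketch for a single categorical attribute. It uses k-ary randomized response with a closed-form unbiased estimator, which is a swapped-in textbook technique and not the EM- or Lasso-based estimators of LoPub. The function names `perturb` and `estimate_distribution`, and the parameter `epsilon` for the privacy budget, are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def perturb(value, domain, epsilon, rng):
    """Each user reports the true value with probability p,
    otherwise a uniformly chosen different value from the domain.
    Assumes the domain has at least two values."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def estimate_distribution(reports, domain, epsilon):
    """The collector inverts the known perturbation probabilities
    to obtain an unbiased estimate of each value's frequency.
    Individual estimates may fall slightly outside [0, 1]."""
    k = len(domain)
    e = math.exp(epsilon)
    p = e / (e + k - 1)      # probability of reporting the true value
    q = 1.0 / (e + k - 1)    # probability of reporting one specific other value
    n = len(reports)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}
```

A collector would call `perturb` on each user's device and `estimate_distribution` on the pooled reports. Extending this to joint distributions over many attributes is where the cost grows quickly, which is the problem the abstract's EM and Lasso based estimators target.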